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We consider wavelets as a tool to perform a variety of tasks in the context of 
analyzing cosmic microwave background (CMB) maps. Using Spherical Haar Wavelets 
we define a position and angular-scale-dependent measure of power that can be used 
to assess the existence of spatial structure. We apply planar Daubechies wavelets 
for the identification and removal of points sources from small sections of sky maps. 
Our technique can successfully identify virtually all point sources which are above 
3(7 and more than 80% of those above la. We discuss the trade-offs between the 
levels of correct and false detections. We denoise and compress a 100,000 pixel CMB 
map by a factor of ~ 10 in 5 seconds achieving a noise reduction of about 35%. 
In contrast to Wiener filtering the compression process is model independent and 
very fast. We discuss the usefulness of wavelets for power spectrum and cosmological 
parameter estimation. We conclude that at present wavelet functions are most suitable 
for identifying localized sources. 

Key words: cosmic microwave background — methods: statistical. 



1 INTRODUCTION 

The cosmic microwave background (CMB) radiation en- 
codes a vast amount of information about the early universe 
and its subsequent evolution. A number of experiments: 
ground-, balloon- and space-based, are poised to generate 
large data sets from which one hopes to decode this infor- 
mation. Optimal data analysis methods to be utilized with 
these data sets are presently an active area of research. In 
this paper we explore the application of wavelet methods to 
the analysis of CMB data sets. 

Cosmological theories usually model the CMB sky as a 
homogeneous random field on the sphere. Spherical harmon- 
ics arise naturally through the spectral decomposition of the 
field. They are optimal for frequency localization — indeed, 
they define spatial frequencies, £, on the sphere — but they 
do not allow assessment of local features. To represent a 
signal that is nonzero only on a small patch of the sky a 
large number (formally infinite) of spherical harmonics are 
required to obtain all necessary cancellations outside the 
patch. Wavelets, on the other hand, are locally supported 
and therefore, only those supported in the patch are needed 
to represent the signal. 

Wavelets have become an attractive tool for data anal- 



ysis and compression because they are computationaly effi- 
cient and have better time-frequency localization than the 
usual Fourier methods. Although wavelets are usually de- 
fined on the real line, or subsets of R", there have been some 
recent generalizations to other surfaces like the sphere. In 
this paper we use spherical wavelets when dealing with large 
portions of the sky and planar wavelets for small patches. 

Spherical Haar wavelets (SHW) were introduced by 
Sweldens (1995) as a generalization of planar Haar wavelets 
to the pixelized unit sphere. The basic idea of SHW is to 
transform the original map into the sum of a smoothed lower 
resolution map and a set of coefficients encoding the small- 
angular-scale details not captured by the smoothed map. 
The smoothing is done by lowering the resolution of the map 
through a hierarchical pixelization scheme. Thus, pixeliza- 
tions of sky maps automatically yield SHW, and these are 
useful to study data as a function of position and angular 
scale. SHW are not smooth but are nonetheless attractive 
because they can be easily implemented and used as a com- 
putationally efficient exploratory data analysis tool. In Sec- 
tion |^ we present a wavelet formalism on the sphere adapted 
to CMB sky maps. We then use this formalism to compare 
structure in the COBE-DMR maps with predictions of ex- 
pected structure based on model dependent simulations. 
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A common thread throughout the rest of the paper is 
that of denoising a noisy signal. Because wavelet functions 
have excellent localization properties and their 'threshold- 
ing' is the asymptotically best way to remove additive Gaus- 
sian white noise (Donoho 1992), they are useful for both the 
identification of sources embedded in Gaussian noise and 
for suppression of such noise. A further attractive feature 
of wavelet denoising is that the process is computationally 
efficient. In Section^ we use wavelets for the identification 
of point sources embedded in noise. We do not use SHW for 
this particular application because SHW are not smooth and 
thus are not optimal for estimating smooth localized sources. 
We do not know of any computationally efficient smooth 
wavelets on the sphere. For these reasons the algorithm we 
propose is more appropriate for small patches of the sky 
(where the flat sky approximation is appropriate) and we 
use tensor products of one dimensional Daubechies wavelets. 
Wavelet decomposition of CMB maps for the purpose of 
identifying non-Gaussianity has been considered recently by 
Ferreira, Magueijo & Silk (1997), Pando, Valls-Gabaud & 
Fang (1998), and Hobson, Jones & Lasenby (1998). Several 
other groups are concentrating efforts on usage of wavelet 
for foreground identification and removal (e.g., Cayon 1999, 
Sanz et al. 1999). 

Current methods for compressing and denoising CMB 
maps are based on Karhunen-Loeve transformatio ns (Bond 
1995, Bunn & Sugiyama 1995) or Wiener filtering (Bunn et 
al. 199(|). For a sky map with N pixels the computational 
cost of these methods is 0(N 3 ) so their brute force appli- 
cation to future mega-pixel maps (N « 10 6 ) is an unsolved 
(and perhaps unsolvable) computational challenge. Map de- 
noising by wavelet thresholding is significantly more efficient 
taking only 0(N\og(N)) operations. For example, applying 
the wavelet transform, denoising and reconstructing a » 10 5 
pixel map takes less than 6 seconds on an Alpha 500/266. 
Furthermore, wavelet thresholding can be done in a model 
independent way. In Section ^ we present such a technique 
using SHW. 

It has been pointed out repeatedly (e.g., Bond, Jaffe 
& Knox 1998, Borrill 1999, Bond et al. 1999 and references 
therein) that because of the size of future data sets new 
analysis techniques will need to be developed. Techniques 
that can efficiently extract cosmological information from 
mega-pixel sky maps. In Section plwe discuss the usefulness 
of wavelets for this task. SectionM summarizes the results. 
A summary of SHW applied to the COBE pixelization can 
be found in the Appendix. 



2 WAVELET POWER AND CMB SKY MAPS 

A CMB map is a vector T = (Ti) of temperatures T in 
pixels i, located at position £i. We decompose this vector — 
which is also a field on the sky — into its spherical-harmonic 
components, 



Up to normalizing constants, the expected value of the 
RMS 2 of the map is 



T(x) = a lm Y lm (x). 



(1) 



We can then define the power spectrum in the usual way as 
the expectation of the square of the components 



(±2 T ^ «I>£ + i)cv^. 



(3) 



An observation, di, of this (pure-signal) map is the sum of 
the map with some noise component, di — Ti + m, so 



(4) 



(ae m a>t>m.>) = CeS U iS„ 



(2) 



where Na* is the noise correlation matrix, and in particular 
Nu = of is the noise variance of pixel i. 

We now introduce an analogous formalism, expressing 
the map RMS 2 using wavelets. Since wavelets are localized 
both in the spatial and frequency domain, a wavelet expan- 
sion easily leads to partitioning of the RMS 2 into compo- 
nents from different angular scales and from distinct por- 
tions of the map. We introduce the formalism in terms of 
spherical wavelets, most suitable for maps covering large 
portions of the sky. 

Spherical Haar wavelets were introduced by Schroeder 
& Sweldens (1995) as an example of what they call second- 
generation wavelets. These are wavelets which are not dila- 
tions and translations of a mother wavelet and which can 
be defined in spaces more general than R n . As far as we 
know they were the first to introduce computationally effi- 
cient wavelets on the sphere. Sweldens (1995) applied SHW 
and other second-generation spherical wavelets to data com- 
pression. 

The SHW transform of a map 

d = (di) = Xj 

of maximum resolution J is (see Appendix) 
Wd = 7= (A Jo ,7j , • ■ • ,7j_i)*, 

where the vector Xj a is the map at the lowest resolution, 
Jo, and 7j, J < j • < J, is a vector of wavelet coefficients at 
scale j. We define the scale- j power in the wavelet domain 
as a normalized sum of squares of the coefficients at scale j. 
To interpret this measure we show its relation to the RMS 2 
of a map: RMS 2 = £\ iu 2 df/ ^ uf , for chosen weights Wi. 
The wavelet transform (where the weights Wi are implicitely 
included in Aj) yields an angular-scale decomposition of the 
RMS 2 

RMS 2 = -(XjYXj 
w 

= -[(Xj^AjXjo + (7Jo)'Bj 7j 
w 

H h (7/-i)' B ./-i7J-i] 

= RMS 2 o + --- + RMS 2 , (5) 

where w = VI „■ y/wf and Aj a , B j D , B j_i are diagonal 
matrices defined by the wavelet transform (see Appendix). 
Expression (^) separates contributions to the RMS 2 from 
different resolutions. Each RMS 2 corresponds to a window 
function of Ce- Table 1 shows the results of the RMS 2 de- 
composition of some spherical harmonics of different order 
I. The table shows the percentage of the total RMS 2 that 
goes into each scale. As expected, as £ increases a higher per- 
centage of the power goes into the higher scales (i.e., higher 
j). Table 1 also shows that spherical harmonics are not well 
localized in wavelet scale. 
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Table 1. Percentage of the total RMS 2 that goes into each scale 
j for some spherical harmonics of different orders I. The lowest 
and highest resolutions are J a = 

I RMS 2 RMS 2 . RMS 2 



5 and J = 8, respectively. 
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Figure 1. Histograms of the Zj statistics for 1000 simulations of 
beam smoothed, constant power (1(1 + 1)CV = constant) large- 
angular-scale maps, I < 28. The dashed lines correspond to the 
values of Zj for the actual DMR 53+90 GHz DMR. 
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It is useful to construct the standardized sum of squares, 
SSQj = (A7j)'A7j, where A is a diagonal matrix that 
transforms the covariance matrix of the SHW coefficients 
to the identity. Assuming independent Gaussian noise the 
null distribution of SSQj, i.e., its distribution in the absence 
of signal, is a xl- ~ N(kj,2kj), where kj is the subspace 
dimension at scale j. To search for structure in the map at 
scale j we use the statistic 



SSQ, 



(6) 



With uncorrelated noise and known noise RMS the Zj are 
centered at zero if and only if there is no structure at scale 
j, regardless of the Gaussianity of the noise. Also, if there 
is no structure then, by the central limit theorem, the Zj 
axe approximately Gaussian-distributed even with uncorre- 
lated non- Gaussian noise. The distribution of Zj is thus an 
indication of structure. 

We apply our statistic to test for structure in the COBE 
DMR map. Figure ^ shows histograms of the Zj (Eq. ^) for 
1000 simulations of large-angular-scale maps t < 28, with 
£(£+l)C e = (247r/5)Q 2 ms -p S , with Q rms _ PS = 15.3 /kK, 7° 
FWHM beam smoothing and the noise level of the DMR 
53+90 GHz map. The dashed lines in the figure correspond 
to the actual values of Zj for the DMR 53+90 GHz map. 
The values are consistent with those expected from the sim- 
ulations. Both the simulations and the DMR maps have a 
20° galactic cut. SHW are easily adaptable to sky maps with 
Galactic cuts or with any others kinds of gaps. The wavelet 



functions are still orthogonal and the null distributions are 
exactly as before (with smaller kj). We note that with only 
a 10° galactic cut (i.e., with considerable galactic contami- 
nation) the values of the Zj obtained from the DMR maps 
are 71.2, 5.9, and 1.6 for j = 4, 5 and 6, respectively. The 
histograms representing the simulations, which have only a 
CMB signal in them, remain virtually unchanged. As ex- 
pected, more significant structure is detected when some of 
the galactic signal is present, and this structure is not con- 
sistent with a cosmological signal alone. 

At resolution 4 the mean of the simulations and the 
actual map are both considerably displaced from zero. At 
resolution 5 z$ is somewhat less displaced from zero and 
at resolution 6 ze is consistent with zero. This is consistent 
with the beam scale being right around the pixel size at 
resolution 5 so there is structure on that scale and at the 
lower resolution, but no structure at the higher resolution. 



As an aside, note that the covariance matrix S 7 = (77) 



of 7 is 



W(C + Enoi SC )W t 



(7) 



where C and £ no j so are the covariance matrices of the model 
and noise, respectively. In general the matrix A, which is 
used to transform the covariance matrix of A to the identity, 
is 

a = sr 1/2 . (8) 

If we take no correlations from a cosmological model (C = 
0) and the noise is uncorrelated and homoscedastic (i.e., 
Enoise oc I), then S 7 is diagonal. But S 7 is a lot more 
complicated when a cosmological model is assumed. 

Our analysis has not taken full advantage of the local- 
ization properties of wavelets. SHW can easily yield a spatial 
decomposition. First, divide the sky into P disjoint patches. 
Each patch i can be represented as Xj,i by filling with zeros 
those pixels not in patch i. All the Aj,i are orthogonal and 

RMS 2 = i[(A J .i) t Aj, 1 + ■ • • + (Aj.p^Aj.p]. 

w 

An angular-scale decomposition like (0) can now be applied 
to every term. 



3 WAVELET DENOISING OF A MAP 

In the following sections we apply wavelet denoising with 
two different applications in mind. In the first application de- 
noising will be used to identify and remove localized sources 
from a CMB map. In the second application denoising will be 
used to suppress instrumental, or other non-localized noise, 
and compress a map. In both operations we use the same 
denoising technique. The techniques discussed are applica- 
ble to both spherical and flat space wavelets. Both types of 
wavelets will be considered. 

Let di be the measured temperature in pixel i, we write 
di = T S) i + Fi + m, where T s ,i is the underlying tempera- 
ture at pixel i, Fi is a foreground source temperature, and 
m is a noise term describing noise sources which are not 
localized on the sky (e.g., instrumental, atmospheric ). The 
assumptions we make about T S: i depend on whether we re- 
move sources or suppress noise in a map. In the first case 
we assume that T s a and m have zero mean and are uncor- 
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random, drawn from some power spectrum, and we attempt 
to discriminate fixed sources from random realizations of T s 
in our sky. In the second application, when suppressing noise 
in a map, both T s ^ and Fi are assumed fixed. The goal is to 
suppress rii and thus obtain a better estimate of the actual 
realization of the signal in our sky. 

In either case the procedure consists of 'denoising' by 
thresholding the wavelet coefficients of the data and trans- 
forming back. The difference between the two cases arises in 
what is assumed as noise du ring the thresholding stage. This 
will be discussed in Section 3.2 We start by computing the 
wavelet transform of the map. Wavelet coefficients are then 
either set to or shrunk towards zero depending on whether 
their absolute values are above or below a chosen threshold. 
Finally we compute the inverse wavelet transform of the 
thresholded coefficients. We call this procedure 'denoising' 
(see Donoho 1992 for a discussion of this term.) Although 
the procedure is similar to the one often used with Fourier 
coefficients, Donoho (1992) showed that wavelet denoising 
efficiently supresses noise while preserving the sharpness of 
the original signal. This property makes wavelet denoising 
appealing for our applications. 

3.1 Thresholding 

'Thresholding' is a prescription for using the sample wavelet 
coefficients jj, m to estimate the true wavelet coefficients, 
i.e., the wavelet coefficients of the noiseless map. We use 
thresholding functions of the form (Donoho 1992) 



(9) 



C fj.rn ~ Tj if 7j, m > Tj 
5{lj,m) = < if |7j, m | < Tj 

{ 7j,m + t 3 if 7j >m < -Tj, 

where jj, m is a wavelet coefficient of the map and Tj is a 
chosen threshold at scale j. A thresholding scheme corre- 
sponds to a particular prescription for choosing the Tj. The 
7j,m are now replaced by the <5(7j, m ) which are used as the 
"denoised" wavelet coeffecients. 

Given a coarsest resolution J , Donoho & Johnstone 

(1994) suggest keeping the coefficients Xj where the sig- 
nal dominates the noise and thresholding the rest. Wavelet 
thresholding is an active area of research. Comparing dif- 
ferent thresholding methods is outside the scope of this pa- 
per. See Nason (1995) for a survey of thresholding methods. 
We use the SureShrink procedure of Donoho & Johnstone 

(1995) . Ideally one wants to use the threshold that mini- 
mizes the mean square error (MSE) 



possible mean square error of the wavelet coefficients es- 
timates. Under certain conditions minimax estimators are 
posterior Bayesian estimates given a least favorable prior, 
see Lehmann & Casella (1998) for a more complete discus- 
sion regarding the relation between minimax and Bayesian 
estimators. 



3.2 Noise 

Denoising requires information about what constitutes noise 
and what its distribution is. When we use wavelets to iden- 
tify foreground sources, Fi, both the cosmological signal T 3 ,% 
and the noise m are assumed to be random components 
of the noise that contaminates the wavelet coefficients, i.e., 
a 2 — (Tomb + "noise i and the denoised wavelet coeffecients 
are identified with the sources alone. In the second appli- 
cation, when we only suppress noise in a map, only n% is 



considered noise, i.e., a = ov, 



and the signal component 



MSE = <£(J( 7j . 



2 >, 



(10) 



is taken to be the cosmological CMB component, T a ^. These 
values of a are used by SureShrink to compute the estimate 
of the MSE @. 

In this paper we assume that pixel noise is uncorrelated 
and Gaussian. This was approximately the case for the DMR 
maps (Lineweaver et al. 1994) but it may not be a reason- 
able assumption for other experiments. For cases where the 
noise is correlated we point out the following. If the corre- 
lation structure is known then the map can be transformed 
to the uncorrelated case; however, depending on the size 
of the covariance matrix, such a transformation might be 
computationally expensive. Moreover, this transformation is 
not appropriate when searching for localized sources because 
it destroys and widens their shape. More generally, such a 
decorrelating transformation no longer gives a "sky map", 
but a linear combination of sky pixels. Only in the case of 
suitably mild correlations can such a decorrelated dataset 
still be considered a map per se. One should also note that 
for the process of identifying localized sources, when the 
signal T s ,i is considered part of the noise, the cosmological 
correlation of T 3> i must also be included in the noise cor- 
relation structure. Depending on the amount of correlation, 
normalizing only by the variance (as for uncorrelated noise in 
Section^) may be sufficient. Johnstone & Silverman (1997) 
show that SureShrink still works well under mild conditions 
on the correlation structure. They show that long range cor- 
relations in the data are reduced by the wavelet transform, 
that there is almost no correlation between coefficients from 
different scales, and that Stein's MSE estimate is still unbi- 
ased. 



where \ij, m is the mean of the coefficient 7 j, m . However, the 
means Hj, m are unknown and therefore so is the MSE @. 
To overcome this difficulty SureShrink uses Stein's unbiased 
estimate of the MSE (see Donoho 1992) . SureShrink selects 
a threshold Tj at each scale j that is the best, for rules of 
the form (^j), in the sense that it minimizes Stein's estimate 
of the MSE ([HJ). The computational cost at each level is 
0(kj\og(kj)). Denoising a map with gaps is done in exactly 
the same way but using only those coefficients based on pix- 
els outside the gaps (see Appendix). 

SureShrink wavelet thresholding is asymptotically mini- 
max (Donoho & Johnstone 1995), i.e., it minimizes the worst 



4 ESTIMATION AND REMOVAL OF POINT 
SOURCES WITH WAVELETS 

We now use the denoising procedure to reduce localized 
source contamination in sky maps. Here T S: i and ni are as- 
sumed to have zero mean and to be uncorrelated, while Fi is 
fixed and unknown. The goal is to estimate the foreground 
source field F. Since the noise and the cosmological model 
have zero mean (we assume that the CMB monopole has 
been subtracted), uncontaminated denoised wavelet coeffi- 
cients should be close to zero. Our goal is to find an estimate 
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Figure 2. Estimating a source field (top left) from a noisy source 
field (top right) by thresholding Haar (bottom left) or Daubechies 
(bottom right) wavelet coefficients. All sources have identical 
signal, and the s/n is 3, where s/n is defined as (peak ampli- 
tude) /(RMS noise). Daubechies wavelets yield higher s/n esti- 
mates of the source location and shape. 







F of the foreground field F by thresholding the wavelet co- 
efficients. We then use a criterion to identify sources in the 
field. We suggest masking pixels in identified locations. 

To estimate the source field we normalize the coeffi- 
cients using the matrix A defined by Equation (]§]). To esti- 
mate the variability of the wavelet coefficients at each reso- 
lution, i.e., the diagonal elements of A, we use the median 
absolute deviation. This measure is resistant to outliers and 
thus less affected by point sources. Once coefficients have 
been normalized they are denoised using SureShrink. By as- 
sumption, the means in Eq. satisfy flj, m = for all jj, m 
not contaminated by sources. 

Using SHW on the whole sky has some drawbacks. 
First, SHW are not smooth and therefore do not provide 
good estimates of smooth peaks of localized sources. (Local- 
ized sources are at least as broad and smooth as the beam of 
the experiment.) Also, since local sources are sparse they are 
difficult to detect when considering the entire sky. For noise 
dominated data Su reShrink uses a large threshold propor- 
tional to y/\og(kj). These two problems are alleviated by 
taking smaller sky sections where smooth planar wavelets 
can be used. Patches are easily defined through the map 
pixelization by taking lower resolution pixels as patches. 
On each patch we use tensor products of one-dimensional 
Daubechies wavelets. (We found that better results were ob- 
tained with four vanishing moments, and that other wavelet 
bases do not lead to significantly different results from those 
obtained with Daubechies wavelets.) Note that SHW on 
small patches reduce to tensor products of Haar wavelets. 
Figure |^ shows the original (six sources of the same ampli- 
tude) and original plus white noise (s/n = 3, where s/n is the 
ratio of the peak amplitude to the noise RMS) source fields 
on a patch, and thresholded estimates using planar Haar 
and Daubechies wavelets. Daubechies wavelets recover the 
peaks of the sources better than Haar wavelets. Hence we 
use Daubechies wavelets for point source identification and 



removal. Note also that source estimates are very close to 
flat wherever there are no sources. This minimizes potential 
systematic effects arising from false source identification. 

4.1 Single Source in White Noise 

To learn about the denoising process we first apply the pro- 
cedure to a single source buried in white noise. Table | (left) 
shows the relative error in reconstructing the source. We cal- 
culate RMS(F - F)/RMS(F) inside the source, where 'in- 
side' is defined as pixels where F > 1% of the noise RMS. 
This measure of relative error is an indication of power in 
the source field not accounted for in the estimated field. 
The s/n is the ratio of the source peak to the noise RMS. 
The size is the ratio between the number of pixels inside 
the source and the total number of pixels in the patch in 
percent. The first and second rows correspond to Haar and 
Daubechies wavelets, respectively. In all cases Daubechies 
wavelets yield smaller relative errors. In Table || (right) we 
quantify to what extent the wavelet thresholding affects re- 
gions originally not contaminated by the source. We calcu- 
late the ratio RMS(F — F)/RMS(noise) outside the source. 
The table shows that the ripples introduced in the originally 
uncontaminated region have an amplitude of less than 20% 
of the noise RMS, well within the noise level, and that this 
amplitude is very nearly constant with source size. 

4.2 Spectrum of Source Intensity 

We will now attempt to identify and remove a more realistic 
distribution of sources from a map. We generate a source 
field using a power-law 

-2.5 



dN(s) oc s ' ds, 



(11) 



where N is the number of sources and s is signal-to-noise ra- 
tio. (As before, s/n is the ratio of peak amplitude and noise 
RMS.) Source centers are randomly distributed on the patch 
and only sources with s > 1 are included. The power-law 
exponent, —2.5, is appropriate for a population of sources 
with a single luminosity distributed randomly through an 
infinite, flat, euclidean space. The total number of sources 
in the patch is determined by the choice of different ratios 
(total area inside sources)/(total patch area), where 'inside' 
is defined as in the previous section. The noise is assumed to 
be either white or characterized by a Gaussian correlation 
function with damping scale as a fraction of the maximum 
wavenumber. We used two Gaussian random fields, G\ and 
G2, with damping scales 0.8 and 0.6, respectively. The field 
Gi has correlations of approximately 12%, 2% and 1% be- 
tween first, second and third neighbors, respectively. The 
correlations for the second Gaussian field, G2, are 20%, 3% 
and 1%. Rather than using a particular CMB model, we 
used the Gaussian correlated noise as a generic model for a 
correlated signal. As we will see our results are not sensitive 
to the particular correlation structure assumed. 

To determine source locations in the thresholded map 
we perform a second thresholding on F. This is done by 
comparing pixel absolute magnitudes with fcx(MAD), where 
MAD is the median absolute deviation of F, for different 
values of the threshold k. 

Figures ^, ^ show the results of the point-source iden- 
tification process for sources that were above la and 3a, 
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Figure 3. (a) Proportion of correctly detected source pixels that 
were above la as a function of Area for different thresholds 
fcxMAD. 'Area' is the fraction of pixels covered with sources, 
k is an integer and MAD is the mean absolute deviation of the 
denoised map - see text for more details, (b) Proportion of pixels 
outside the source that were incorrectly identified as source pix- 
els. As a comparison, the dotted lines correspond to the process 
of detecting source pixels by setting a simple 1 a cut on the map. 

(a) (b) 
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Figure 4. (a) Proportion of correctly detected source pixels that 
were above 3a as a function of Area for different thresholds 
fcxMAD. 'Area' is the fraction of pixels covered with sources, 
k is an integer and MAD is the mean absolute deviation of the 
denoised map - see text for more details, (b) Proportion of pixels 
outside the source that were incorrectly identified as source pix- 
els. As a comparison, the dotted lines correspond to the process 
of detecting source pixels by setting a simple 3 a cut on the map. 



(a) (b) 




5 10 15 20 25 5 10 15 20 25 

Area (%) Area (%) 

respectively. For each figure, plate (a) shows the proportion 
of correctly identified source pixels for k = 2, 3, 5 and plate 
(b) shows the proportion of pixels outside sources that were 
incorrectly identified as source pixels, for the same k val- 
ues. The area is defined as the proportion of source pixels 
above la. As k increases the probability of correct detection 
decreases as does that of incorrect detections. The results 
in Figures ^, ^| are for sources in white noise but no signifi- 
cant difference was observed with Gaussian correlated noise. 
Even a Gaussian field with damping scale 0.3 did not show 
significant differences. 

The figures show that wavelets are extremely effective 
in identifying point sources. When we are looking for sources 
of moderate s/n (Figure ^) which cover up to ~ 5% of the 
patch area, virtually all source pixels are identified regard- 
less of the value of k. For low s/n sources (Figure ^), a crite- 
rion of k < 3 identifies more than 80% of the source pixels. 
Comparing Figures ^ and ^ we see that for wavelet thresh- 
olding the proportion of false detections does not change 
much with the s/n of the source being searched for. In addi- 
tion, simulations show that when there are no sources in the 
patch only about 1% of the pixels in the patch are incorrectly 



identified as sources. To reduce the number of 'incorrect de- 
tections' over the entire map, which presumably consists of 
many patches, we may choose not to subject every patch 
to this source cleaning procedure. Rather, it may be advan- 
tageous to first calculate the value of Zj for the patch, as 
described in Section and then proceed with the source 
cleaning procedure only on patches with abnormally large 
values of Zj at some scale j. Further analysis and simula- 
tions are necessary to optimize the method. 

One can compare the performance of wavelets in iden- 
tifying point sources to the performance achieved by a num- 
ber of alternative techniques. A detailed study is in progress. 
Here we only compare with the simple practice of removing 
map pixels whose amplitudes are larger than ka, where a 
is the RMS of the noisy source field. The dotted lines in 
Figures ^, |i| correspond to a la and 3a cuts, respectively. 
Wavelets achieve a significantly higher proportion of correct 
detections, and less errors are made when searching for low 
s/n sources. Interestingly, the ka cut has fewer errors when 
searching for high s/n sources. That suggests that the op- 
timal point source identification procedure may involve a 
combination of techniques, with wavelets being the leading 
tool. 



5 MAP DENOISING AND COMPRESSION 

In this section we apply wavelet denoising with the goal of 
compressing information and reducing noise for visualisation 
of CMB maps. Denoised maps are useful for displaying or 
transmitting compressed map files. 

Here we assume that the CMB sky, T 3 , is fixed and 
that the random field we are trying to isolate and remove 
is the instrumental noise. Hence, only the noise variability 
is included in the matrix A (Eq. ^|), and it is estimated us- 
ing only the wavelet coefficients at the highest resolution 
where we expect little structure. Alternatively, A could be 
estimated using the difference of two independent maps (as 
with the A — B DMR maps). This time SureShrink deter- 
mines the threshold Tj that minimizes ( |ici| ) with fJ,j, m being 
the unknown wavelet coefficients of the fixed sky T s . The 
denoised estimate of T s is then W -1 <5(7), the inverse trans- 
form of the thresholded coefficients. The computational cost 
is 0(N\og(N)). Note also that the estimate of the underly- 
ing temperature field achieved in this process is model in- 
dependent. In other words, it did not require a cosmological 
model. 

In other work Wiener filtering has been used for de- 
noising CMB maps (Bunn et al. 1996). The Wiener filter 
estimate of T s is Cd, where the matrix C is determined 
by assuming a cosmological model for T s and minimizing 
(||Cd — T s || 2 ). The process takes 0(N 3 ) operations and 
depends on the fiducial model chosen. Thus, while Wiener 
filtering provides a posterior mean estimate given a prior 
cosmological model, wavelet thresholding provides an esti- 
mate of the actual realization of the unknown field without 
assuming a cosmological model. The Wiener filter can be de- 
fined as the maximum-probability signal contribution given 
the data, the underlying signal power spectrum and noise 
correlation, and assuming Gaussianity for both signal and 
noise. SureShrink wavelet thresholding, on the other hand, 
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Table 2. Left: Relative error RMS(F — F)/RMS(F) in estimating a field with a single source 
embedded in white noise. The relative error is calculated using only pixels inside the source, 
where 'inside' is defined in the text. The s/n is the ratio of the source peak to the noise RMS. 
The size is the percentage of pixels of the patch inside the source. The first and second rows 
correspond to Haar and Daubechies wavelets respectively. Right: the ratio of the residual 
RMS to the noise RMS outside the source. 
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Table 3. Noise reduction and compression achieved by 
SureShrink denoising of a standard CDM model for different sig- 
nal to noise ratios. 

signal/noise RMS(residuals)/RMS(noise) % coefficients used 



.82 
.65 
.57 



33 
14 
7 



is aymptotically minimax (see Section 3.1 and Donoho & 
Johnstone 1995). 

Figure [] shows the denoising of a simulated standard 
CDM map with Q, b = .05, f2 = 1, H = 50 km/s/Mpc. The 
signal to noise in the signal plus noise map (middle) is ~ .7, 
where the signal to noise is the ratio of the noise RMS to the 
signal RMS. There are 98304 pixels in the map. The denoised 
map (bottom) achieved a 35% noise reduction, as measured 
by the ratio RMS (true map - denoised map) /(noise RMS), 
using only 14% of the 98304 wavelet coefficients. It also has 
only 14% of the original number of pixels, but pixels have a 
variety of sizes, adapted to the local structure as identified 
by the thresholding procedure. The entire denoising process 
took less than 6 seconds on an Alpha 500/266 workstation. 
The compression rate that denoising can achieve depends 
on the signal to noise. As it increases more coefficients have 
to be used. Table 3 shows the noise reduction and compres- 
sion rate for different signal to noise ratios using the same 
standard CDM model. 

Wavelet denoising is a flexible procedure in that it al- 
lows one to choose the threshold level which will achieve 
a particular compression rate regardless of the denoising 
achieved. In our case the thresholds were chosen to min- 
imize an estimate of the mean square error (|l^) but this 
may not be the optimal procedure. Depending on the goal 
at hand one may be able to devise a thresholding procedure 
tailored for a particular task. 



6 SHW AND COSMOLOGICAL 
INFORMATION 

It has become traditional to consider the cosmological in- 
formation from CMB datasets in the context of likelihood 
functions, usually expressed in pixels or in the spherical har- 
monic domain. Since no information is lost by transforming 
to wavelet space we can ask whether it is better to work in 



wavelet domain when the goal is to estimate Ces and cos- 
mological parameters. 

Assuming Gaussianity of both the cosmological signal 
and the noise, the log-likelihood of the data given a model 
with covariance matrix M is 



21n£(pi]d) = ln|M| +d t M~ 1 d 

= In |S 7 | + 7 'W~ t lVr 1 W" 1 - 



(12) 



(up to an additive constant) where Pi are the cosmological 
parameters and 7 = Wd is the wavelet transform of the 
data d. M = C + E no i sc is the sum of contributions from 
the signal and the noise covariance. In the second equality 
we have transformed from the map pixel basis to the wavelet 
domain (see also Eq. ^). To estimate the y>; it is enough to 
consider the eigenmodes of C which contribute the most to C 
(this is the Karhunen-Loeve or "signal-to-noise eigenmode" 
transformation, Bond 1995). This simplifies the likelihood by 
compressing the data. The reduced likelihood can be used 
to estimate cosmological parameters. 

Can we apply the same technique on the thresholded 
wavelet coefficients? Because the thresholding procedure is 
nonlinear the distribution of the denoised 7 = 5(7) is no 
longer Gaussian and therefore Eq. ^ written in terms of 
S is no longer the correct log-likelihood. It is computation- 
ally expensive to compute the correct likelihood function of 
the thresholded coefficients for each plausible cosmological 
model. We can still maximize Eq. Q with respect to pi, to 
estimate the parameters but the perfomance of this proce- 
dure is still to be investigated. 

The difficulties due to the non-linearity of the wavelet 
thresholding process can perhaps be resolved by treating the 
process as a linear operation. That is, we consider the thresh- 
olding operation of Eq. ^ as if the thresholds Tj are fixed be- 
forehand. Then, the new wavelets coeffecients <5(7j,m) can 
be considered as linearly related to the old coeffecients: 
^(7.7, "0 ~ a jm7j,m- This is similar to, for example, Wiener 
filtering, where each Fourier mode is modified by some fac- 
tor < ah < 1 (although in that case the process is strictly 
linear since the do not depend upon the data). Thus, 
the denoised wavelet coeffecients can be considered a much 
smaller linear rendition of the original data. The correla- 
tion structure may indeed be much more complicated than 
the raw data, but because there may be many fewer coeffi- 
cients, the 0(N 3 ) operations necessary for manipulating the 
likelihood are much faster. This procedure is under further 
investigation. 
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Figure 5. Top: a simulated, signal only, CMB map based on a 
standard CDM model. There are about 98000 pixels in the map. 
Middle: the signal map with noise added. The s/n in this map 
is .7. Bottom: The noisy map after denoising with SureShrink. 
Only 14% of the wavelet coefficients were used, and the denoising 
process took less than 6 seconds on an Alpha 500/266 worksta- 
tion. 




Since working with the denoised coefficients seems prob- 
lematic one could use the likelihood without denoising the 
coefficients. But then nothing seems to be gained by working 
in wavelet domain since the quadratic term in ( |l^ ) does not 
split into independent components of different scales — the 
second equality above is not easier to compute than its ver- 
sion in the pixel domain. Consider, for example, a quadratic 
estimator AjEAj of Ce (e.g., Tegmark 1997; Bond, Jaffe & 
Knox 1998). Since the wavelet transform acts as a band-pass 
filter, one might hope to reduce leakage and decrease compu- 
tational cost by considering only high resolution coefficients 
to estimate small-angular-scale (large £) Ce, splitting AjEAj 
as in (0) and taking only high resolution coefficients. There 



Figure 6. Histograms of the 24 statistic for the same kind of 
simulations as in Figure]]] but with spectral index n = 0.5 (instead 
of n = 1), where n is the spectral index of the primordial desnity 
fluctuation power spectrum which in turn gives £(£+l)Ci oc £ n_1 . 
The distribution of 24 under this model is clearly different from 
that of the n = 1 model (Figure [lj) . The dashed line indicates the 
value of 24 for the DMR 53+90 GHz map. 

250 1 1 1 1 1 1 1 1 1 



200 - 




are three problems: (1) Spherical harmonics have compo- 
nents in the lower wavelet scales (Table 1) that need to be 
estimated; (2) Wavelet coefficients Xj are orthonormal with 
respect to the identity but not with respect to any matrix E. 
In general the cross terms in a quadratic estimator using a 
decomposition of the form (JH|) do not cancel; (3) The number 
of coefficients at each scale grows exponentially; using only 
the highest resolutions yields little compression. For exam- 
ple, 93% of the coefficients of a resolution 10 map belong to 
the two highest resolutions. Because of these problems, the 
estimator (like the likelihood function itself above) does not 
appear to be any easier to compute in the wavelet domain. 

Another approach is to consider using the wavelet coef- 
ficients themselves as intermediate parameters in the process 
of cosmological parameter estimation. A homogeneous ran- 
dom field is characterized by its power spectrum Ce, which 
is in turn parametrized by the cosmological parameters. In 
principle cosmological parameters can be estimated by fit- 
ting to the C'i . Power in the wavelet domain provides some 
information about cosmological parameters. For example, 
Figure ^ shows the distribution of the normalized power at 
resolution 4 (24) for the same large-angular-scale models of 
Figure [j] but now using n = 0.5 (instead of n = 1), where 
n is the spectral index of the primordial density fluctua- 
tion power spectrum which in turn gives £(£ + l)Ce oc I" -1 . 
Comparing this histogram to the one in Figure [l] it is obvious 
that the distribution of 24 can discriminate between the two 
models. However, even future high resolution CMB maps 
will only be of resolutions j < 10, and only the highest reso- 
lutions j > 7 will be most useful for extracting cosmological 
parameters . Using the wavelet power at scales j = 7, 8, 9, 10 
will not yield enough degrees of freedom to fit 11 cosmolog- 
ical parameters. This paucity of information in the wavelet 
domain is because the number of independent wavelet power 
coefficients (the Zjs) scales only as the logarithm of the num- 
ber of pixels. In contrast the number of independent power 
spectrum coeffecients Ce (which completely determine the 
properties of the homogeneous and isotropic Gaussian CMB 
field) scales as the square root of the number of pixels. Us- 
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ing the wavelet power as we have defined it here will not be 
sufficient. 

We could instead consider abandoning both Ci and the 
wavelet power defined here in exchange for a spectrum more 
localized in both location and scale. The question is whether 
we can find a spectral representation of homogeneous ran- 
dom fields on the sphere in terms of localized functions. 
However, it is easy to show that there is no smooth basis 
with local support decreasing with j, with the prop- 
erty that any homogeneous random field on the sphere can 
be written as J^. onj^ij where (ctij, £V,j') = a^5\ <$j' . 
That is, any other "power spectrum" we define in terms of 
locally supported functions will have an extended "window 
function" in I, as with the wavelet coeffecients of Section ^| 
the usual Ci spectrum is still the most natural choice. 



7 SUMMARY 

Wavelets provide a useful tool to investigate local structure 
in maps. We used SHW to define a local measure of power 
that is ennivalent to a. normalized local angular scale df 



Bayesian thresholding methods that can be used to include 
more informative prior information on the wavelet coeffi- 
cients. See Abramovich et aZ. (1997) and references therein. 
The development of wavelet tools in statistics is a very ac- 
tive field of research. For a review see Silverman (1999) as 
well as the other articles in the same volume. 

The computer code used for work presented in this pa- 
per will soon be freely available a t 
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positio n of the map RMS. This measure can be used to com- 
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pare power in different regions in the sky. Similar measures 
can be defined with other bases of orthogonal wavelets. SHW 
have the advantage of being easily implemented given any 
map pixelization scheme. SHW can also be used to denoise 
and compress maps without doing any diagonalization or in- 
version of matrices. We denoised and compressed a 100,000 
pixel CMB map by a factor of ~ 10 in 5 seconds achieving a 
noise reduction of about 35%. In contrast to Wiener filter- 
ing the compression process is model independent. The total 
computational cost of wavelet transforming and threshold- 
ing a region with N pixels is 0(N\og(N)). 

Since wavelets are locally supported they can be used 
to reduce localized source contamination in maps. We ap- 
plied planar Daubechies wavelets for the identification and 
removal of points sources from small planar-like sections of 
sky maps. Our technique successfully identified most point 
sources which are above 3<r and more than 80% of those 
above la. In addition, wavelet thresholded estimates of 
source fields introduce little structure in uncontaminated re- 
gions. 

We have concentrated on local source detection and 
map compression. We did not find useful direct applica- 
tions of spherical wavelets to the estimation of angular power 
spectra or cosmological parameters. Ideally we would like to 
have a wavelet basis which approximately diagonalizes the 
covariance matrix of the comological model as well as the 
noise covariance matrix, thus simplifying the likelihood func- 
tion. This is not the case for SHW. But in the future there 
may be hope for spherical wavelets that are more compatible 
with spherical harmonics. See for example Freeden & Wind- 
heuser (1997), and references therein. They define smooth 
spherical wavelets that combine a spherical harmonic expan- 
sion for the low order terms and wavelet expansions for the 
small scale structure. However, they do not yet have a fast 
wavelet transform. 

In Section [0] we pointed out that SureShrink wavelet 
thresholding provides posterior estimates for a priori distri- 
butions of the wavelet coefficients that give the largest mean 
square error. Although we have not used them there are also 



APPENDIX A: HAAR WAVELETS FOR THE 
COBE PIXELIZATION 

Spherical Haar wavelets (Sweldens 1995, Schroeder & 
Sweldens 1995) are a generalization of planar Haar wavelets. 
To define SHW we need a hierarchical pixelization scheme. 
One example of such is the COBE sky cube (CSC) pix- 
elization where the surface of the unit sphere is divided 
into 6 ■ 4' J_1 - ) approximately equal area pixels at resolu- 
tion j. Each pixel k at resolution j, Sj t k, has four children, 
Sj+i^ , Sj+i,fc4, at resolution j + Let K(j) be the pixel 
numbers corresponding to resolution j. 

The functions that model the main features of the data 
at each scale are scaling functions, and the functions that 
represent the details of the data not captured by the scal- 
ing functions are the wavelets. The scaling functions of the 
SHW are defined as <Pj,k{w) = 1, if 77 G Sj,k an d other- 
wise. Define Vj C L2 as the closed subspace generated by 
the fj,k'- Vj = clos-fyj^fc I k G K(j)}. For example, a DMR 
sky map is an element of V&. By definition Vj C Vj+\. The 
SHW are defined as an orthogonal basis for the orthogo- 
nal complement, Wj, of Vj in Vj+i- By construction, for 
any chosen J, Vj = Vj © i= ■ W%. For example, a resolu- 
tion 10 map is a sum of a coarser resolution 9 map plus 
a function from Wg encoding the details not captured by 
the coarser map, or a resolution 8 map plus detail functions 
from Ws and Wg. The functions that capture the leftover 
details are combinations of the wavelets at different reso- 
lutions: there are three wavelets, ipj,m 1 ipj,m2>'<Pj,m 3 , associ- 
ated to each pixel Sj,k- Let \Xj^ be the area of pixel Sj t u and 
L{j,k) — {m (k), mi(fc), 7712(7*), m^(k)} be the pixel num- 
bers of its four children, then the three wavelets for pixel 
Sj,k are 

fj+l,m + ^j+l.mz <Pj + l, mi + ( Pj+l,m 2 



2/ij + l,m + 2/ij + 

l,mg 



2,i 



j-\-l,m 



1 + V 



■j+l,m 2 



. m o 
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2/ij + l,m 2//j + 1 ,mQ 



The wavelets {V'i.mffcll i = 1,2,3; fc £ ^"(j)} form a 
basis for Wj. In the limit, for any choice of J , {ipj,m,j > 
jo} U {</>i ,fc| k £ K(jo)} is an orthogonal basis for the space 
of integrable functions on the sphere. For example, take a 
function T (e.g., a sky map) at a finest resolution level J 
(i.e., T £ Vj) and choose a starting resolution J D , then T 
can be written as 



T(i) 



fceir(jo) 



j-i 

Aio.fc^io.fcW + / J / j7j,mV'j,m(i). 



(Al) 



The first sum corresponds to the lowest resolution map and 
the second one to the details from the higher resolutions. 
The coefficients in the expansion (Al) define the wavelet 
transform WT of T 



WT = 7(Aj , jjo, •■■ , tj-i) 

= {{X Jo ,k, k 6 K(J )}, {y j>m , 
meM(j),J <j<J-l}y 



(A2) 



No linear system needs to be solved in order to compute 
the components of 7, they are obtained recursively starting 
from the finest resolution coefficients Aj = T 



A 



3,k 



lEKU+l) 



hj,k,l^j+i,l, and jj >r , 



y " !lj,m,i.Xj ■ 1 



To reconstruct a function given the wavelet coefficients we 
have a recursive inverse transform 

\> + l,i = ^ hj,k,l^j,k + ^2 9j.m,r/j,m- 
keK(j) meM(j) 

The coefficients for the forward and inverse transforms de- 
fined above are 



otherwise 



1998, J. Roy. 
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1 if z e L(j, k) 

otherwise 



[„ i>j,m dfi HI £ L(j, k) 
otherwise 

otherwise, 



where tpj,m is the dual of ipj, m - 

In order to avoid biasing analyses of CMB skies, data 
contaminated by Galactic emissions or other identified fore- 
ground sources are usually discarded. To compute a wavelet 
transform of data with gaps, we discard those pixels at the 
lowest resolution J which are inside the gaps. Only the 
wavelet coefficients correponding to descendants of pixels 
outside the gaps are used. Note that the scaling and wavelet 
functions corresponding to these selected pixels form an or- 
thogonal basis for the sky with gaps. This is due to the local 
support of the functions and to the vanishing of the integral 
of ipj.m over any pixel S 3 \k- 
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